Search CORE

4 research outputs found

Large databases of real and synthetic images for feature evaluation and prediction

Author: Kaneva Biliana K
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2012
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.Cataloged from PDF version of thesis.Includes bibliographical references (p. 157-167).Image features are widely used in computer vision applications from stereo matching to panorama stitching to object and scene recognition. They exploit image regularities to capture structure in images both locally, using a patch around an interest point, and globally, over the entire image. Image features need to be distinctive and robust toward variations in scene content, camera viewpoint and illumination conditions. Common tasks are matching local features across images and finding semantically meaningful matches amongst a large set of images. If there is enough structure or regularity in the images, we should be able not only to find good matches but also to predict parts of the objects or the scene that were not directly captured by the camera. One of the difficulties in evaluating the performance of image features in both the prediction and matching tasks is the availability of ground truth data. In this dissertation, we take two different approaches. First, we propose using a photorealistic virtual world for evaluating local feature descriptors and leaning new feature detectors. Acquiring ground truth data and, in particular pixel to pixel correspondences between images, in complex 3D scenes under different viewpoint and illumination conditions in a controlled way is nearly impossible in a real world setting. Instead, we use a high-resolution 3D model of a city to gain complete and repeatable control of the environment. We calibrate our virtual world evaluations by comparing against feature rankings made from photographic data of the same subject matter (the Statue of Liberty). We then use our virtual world to study the effects on descriptor performance of controlled changes in viewpoint and illumination. We further employ machine learning techniques to train a model that would recognize visually rich interest points and optimize the performance of a given descriptor. In the latter part of the thesis, we take advantage of the large amounts of image data available on the Internet to explore the regularities in outdoor scenes and, more specifically, the matching and prediction tasks in street level images. Generally, people are very adept at predicting what they might encounter as they navigate through the world. They use all of their prior experience to make such predictions even when placed in unfamiliar environment. We propose a system that can predict what lies just beyond the boundaries of the image using a large photo collection of images of the same class, but not from the same location in the real world. We evaluate the performance of the system using different global or quantized densely extracted local features. We demonstrate how to build seamless transitions between the query and prediction images, thus creating a photorealistic virtual space from real world images.by Biliana K. Kaneva.Ph.D

DSpace@MIT

Evaluation of image features using a photorealistic virtual world

Author: Freeman William T.
Kaneva Biliana K.
Torralba Antonio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2011
Field of study

Image features are widely used in computer vision applications. They need to be robust to scene changes and image transformations. Designing and comparing feature descriptors requires the ability to evaluate their performance with respect to those transformations. We want to know how robust the descriptors are to changes in the lighting, scene, or viewing conditions. For this, we need ground truth data of different scenes viewed under different camera or lighting conditions in a controlled way. Such data is very difficult to gather in a real-world setting. We propose using a photorealistic virtual world to gain complete and repeatable control of the environment in order to evaluate image features. We calibrate our virtual world evaluations by comparing against feature rankings made from photographic data of the same subject matter (the Statue of Liberty). We find very similar feature rankings between the two datasets. We then use our virtual world to study the effects on descriptor performance of controlled changes in viewpoint and illumination. We also study the effect of augmenting the descriptors with depth information to improve performance.Quanta Computer (Firm)Shell ResearchUnited States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N00014-06-1-0734)United States. Office of Naval Research. Multidisciplinary University Research Initiative. CAREER (Award Number 0747120)United States. Office of Naval Research. Multidisciplinary University Research Initiative (Grant N000141010933)Microsoft CorporationAdobe SystemsGoogle (Firm

CiteSeerX

DSpace@MIT

Crossref

Matching and Predicting Street Level Images

Author: Avidan Shai
Freeman William T.
Kaneva Biliana K.
Sivic Josef
Torralba Antonio
Publication venue
Publication date: 01/01/2010
Field of study

The paradigm of matching images to a very large dataset has been used for numerous vision tasks and is a powerful one. If the image dataset is large enough, one can expect to nd good matches of almost any image to the database, allowing label transfer [3, 15], and image editing or enhancement [6, 11]. Users of this approach will want to know how many images are required, and what features to use for nding semantic relevant matches. Furthermore, for navigation tasks or to exploit context, users will want to know the predictive quality of the dataset: can we predict the image that would be seen under changes in camera position? We address these questions in detail for one category of images: street level views. We have a dataset of images taken from an enumeration of positions and viewpoints within Pittsburgh.We evaluate how well we can match those images, using images from non-Pittsburgh cities, and how well we can predict the images that would be seen under changes in cam- era position. We compare performance for these tasks for eight di erent feature sets, nding a feature set that outperforms the others (HOG). A combination of all the features performs better in the prediction task than any individual feature. We used Amazon Mechanical Turk workers to rank the matches and predictions of di erent algorithm conditions by comparing each one to the selection of a random image. This approach can evaluate the e cacy of di erent feature sets and parameter settings for the matching paradigm with other image categories.United States. Dept. of Defense (ARDA VACE)United States. National Geospatial-Intelligence Agency (NEGI-1582-04- 0004)United States. National Geospatial-Intelligence Agency (MURI Grant N00014-06-1-0734)France. Agence nationale de la recherche (project HFIBMR (ANR-07-BLAN- 0331-01))Institut national de recherche en informatique et en automatique (France)Xerox Fellowship Progra

CiteSeerX

DSpace@MIT

Infinite Images: Creating and Exploring a Large Photorealistic Virtual Space

Author: Avidan Shai
Freeman William T.
Kaneva Biliana K.
Sivic Josef
Torralba Antonio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2009
Field of study

We present a system for generating “infinite” images from large collections of photos by means of transformed image retrieval. Given a query image, we first transform it to simulate how it would look if the camera moved sideways and then perform image retrieval based on the transformed image. We then blend the query and retrieved images to create a larger panorama. Repeating this process will produce an “infinite” image. The transformed image retrieval model is not limited to simple 2-D left/right image translation, however, and we show how to approximate other camera motions like rotation and forward motion/zoom-in using simple 2-D image transforms. We represent images in the database as a graph where each node is an image and different types of edges correspond to different types of geometric transformations simulating different camera motions. Generating infinite images is thus reduced to following paths in the image graph. Given this data structure we can also generate a panorama that connects two query images, simply by finding the shortest path between the two in the image graph. We call this option the “image taxi.” Our approach does not assume photographs are of a single real 3-D location, nor that they were taken at the same time. Instead, we organize the photos in themes, such as city streets or skylines and synthesize new virtual scenes by combining images from distinct but visually similar locations. There are a number of potential applications to this technology. It can be used to generate long panoramas as well as content aware transitions between reference images or video shots. Finally, the image graph allows users to interactively explore large photo collections for ideation, games, social interaction, and artistic purposes.United States. Army Research Office. Multidisciplinary University Research Initiative (Grant Number N00014-06-1- 0734)French National Research Agency (ANR) (project HFIBMR) (ANR-07-BLAN-0331-01)United States. National Geospatial-Intelligence Agency (NEGI-1582-04-0004

DSpace@MIT